{ggvfields}

Principled Tools for Modeling and Visualizing 2D Vector Fields

ggvfields

Reintroduce the problem

Vector fields

ggvfields can be understood by analogy with core ggplot2 concepts

Real-valued functions of real arguments \(f: \mathbb{R}\to \mathbb{R}\) are familiar from calculus

Their graphs are fundamental to their understanding (eg. \(f(x) = \sin(x)\))

f <- function(x) sin(x)
ggplot() +
  geom_function(fun = f, xlim = c(-pi, pi))

Vector fields

ggvfields can be understood by analogy with core ggplot2 concepts

Vector fields are vector-valued functions of vector arguments \(f: \mathbb{R}^{n} \to \mathbb{R}^{m}\)

In this talk we’ll assume \(n = m = 2\), a common case in applications

Thought of in various ways

\(\displaystyle{\textbf{f}(x,y) = \textbf{f}\left(\left[\begin{matrix}x \\ y \end{matrix}\right]\right) = \left[\begin{matrix}f_{1}(x,y) \\ f_{2}(x,y) \end{matrix}\right]}\) “component functions”

\[\textbf{x} \stackrel{\textbf{f}}{\longmapsto} \textbf{y} \in \mathbb{R}^{2}\]

Examples:

\(\displaystyle{\textbf{f}(x,y) = \textbf{f}\left(\left[\begin{matrix}x \\ y \end{matrix}\right]\right) = \left[\begin{matrix}-y \\ x \end{matrix}\right]}\)

\(\displaystyle{\nabla \ell(\mu,\sigma^{2}) = \left[\begin{matrix}\frac{\partial}{\partial\mu} \ell(\mu,\sigma^{2}) \\ \frac{\partial}{\partial\sigma^{2}} \ell(\mu,\sigma^{2}) \end{matrix}\right]}\)

Current R solutions

f <- function(u) {
  x <- u[1]; y <- u[2]
  c(-y, x)
}

N <- 11
df <- expand.grid( "x" = seq(-1, 1, len = N), "y" = seq(-1, 1, len = N) )
df$fy <- df$fx <- NA

for (i in 1:nrow(df)) df[i,3:4] <- f( as.numeric(df[i,1:2]) )

df <- transform(df, "radius" = sqrt(fx^2 + fy^2), "angle" = atan2(fy, fx))

head(df)
#      x  y fx   fy   radius      angle
# 1 -1.0 -1  1 -1.0 1.414214 -0.7853982
# 2 -0.8 -1  1 -0.8 1.280625 -0.6747409
# 3 -0.6 -1  1 -0.6 1.166190 -0.5404195
# 4 -0.4 -1  1 -0.4 1.077033 -0.3805064
# 5 -0.2 -1  1 -0.2 1.019804 -0.1973956
# 6  0.0 -1  1  0.0 1.000000  0.0000000

Current R solutions

head(df)
#      x  y fx   fy   radius      angle
# 1 -1.0 -1  1 -1.0 1.414214 -0.7853982
# 2 -0.8 -1  1 -0.8 1.280625 -0.6747409
# 3 -0.6 -1  1 -0.6 1.166190 -0.5404195
# 4 -0.4 -1  1 -0.4 1.077033 -0.3805064
# 5 -0.2 -1  1 -0.2 1.019804 -0.1973956
# 6  0.0 -1  1  0.0 1.000000  0.0000000

base R

with(df, {
  plot(NULL,
    xlab = "x", ylab = "y",
    xlim = range(x, x + fx), 
    ylim = range(y, y + fy)
  )
  arrows(x, y, x+fx, y+fy, length = .05)
})

ggplot2

ggplot(df, aes(x = x, y = y)) +
  geom_segment(
    aes(xend = x + fx, yend = y + fy),
    arrow = arrow(length = unit(0.1, "cm"))
  )

Current R solutions

ggquiver

library("ggquiver")

ggplot(df, aes(x = x, y = y)) +
  geom_quiver(aes(u = -y, v = x))

metR

library("metR")

ggplot(df, aes(x = x, y = y)) +
  geom_streamline(aes(dx = fx, dy = fy))  

Current R solutions

ggfields

library("ggfields")
ggplot(df, aes(x = x, y = y)) +
  ggfields::geom_fields(aes(angle = angle, radius = radius))

Current Python solutions

import numpy as np
import matplotlib.pyplot as plt

# Create a grid of points
x = np.linspace(-10, 10, 20)
y = np.linspace(-10, 10, 20)
X, Y = np.meshgrid(x, y)

# Define the vector field
U = -Y
V = X

# Plot the vector field
plt.quiver(X, Y, U, V)
plt.show()

A vector field visualization

Current Mathematica solutions

f[x_, y_] := {-y, x}
VectorPlot[f[x, y], {x, -10, 10}, {y, -10, 10}]

Mathematica 11.3

Mathematica 13.1

Visualizing vector fields


Challenges with results

Unattractive results Could be more visually appealing

Not informative Lacks clarity or detail


Challenges with code

Verbose syntax Very clunky and wordy

High user burden Lots of effort and mental load

Difficult to customize Hard to see now; examples to come

Not extensible Faceting? Superimposing different colors? etc.

Ridged Accept data one way


The results should be…

Correct Does what it claims to do

Clear Easily understandable

Attractive Visually appealing


The code should be…

Simple Straightforward and concise

Familiar Easy for a ggplot user

Flexible Accept functions or data of any form

Solution - ggvfields


Install from CRAN

install.packages("ggvfields")

Install from Github

remotes::install_github("dusty-turner/ggvfields")

Load the package

library("ggvfields")

ggvfields

How should we think about this:

  1. Data Inputs:
    Users will come with both vector data and stream data:
    • x, y, fx, fy
    • x, y, xend, yend
    • x, y, angle, direction
    • x, y, angle_deg, direction
  2. Visualization Types:
    Users will want both vector fields and stream fields.

Points to Ponder

  1. Vectors are special cases of streams.
  2. Streams and vectors are special cases of streamfields and vector fields.

Having said that…

ggvfields is a stream visualization package.

Streams (and vectors) often arise from ODEs

Consider the vector field

\[ \textbf{f}(x,y) = (-y,\,x) \]

which yields the ODE system

\[ \frac{dx}{dt} = -y,\quad \frac{dy}{dt} = x \]

Euler’s method estimates the ODE solution by

\[ \mathbf{v}_{n+1} = \mathbf{v}_n + \Delta t\,\textbf{f}(\mathbf{v}_n) \]

Starting at \(\mathbf{v}_0 = (1,1)\) and using \(\Delta t = 0.1\):

  • Iteration 1:
    \(\textbf{f}(1,1) = (-1,\,1)\), so
    \(\textbf{v}_1 = (1,1) + 0.1\,(-1,1) = (0.9,\,1.1)\).

  • Iteration 2:
    \(\textbf{f}(0.9,1.1) = (-1.1,\,0.9)\), so
    \(\textbf{v}_2 = (0.9,1.1) + 0.1\,(-1.1,0.9) = (0.9-0.11,\,1.1+0.09) = (0.79,\,1.19)\).

library("deSolve")
f_ode <- function(t, state, parms) list( f(state) )
# Set the initial state and define time sequence
init <- c(x = 1, y = 1)
times <- seq(0, 1, by = 0.1)
# Integrate the system using ode
(stream <- ode(init, times, f_ode, parms = NULL, method = "euler"))
#    time           x        y
# 1   0.0  1.00000000 1.000000
# 2   0.1  0.90000000 1.100000
# 3   0.2  0.79000000 1.190000
# 4   0.3  0.67100000 1.269000
# 5   0.4  0.54410000 1.336100
# 6   0.5  0.41049000 1.390510
# 7   0.6  0.27143900 1.431559
# 8   0.7  0.12828310 1.458703
# 9   0.8 -0.01758719 1.471531
# 10  0.9 -0.16474031 1.469772
# 11  1.0 -0.31171756 1.453298

Stream Data

ggplot(stream, aes(x = x, y = y)) +
  geom_path(arrow = arrow()) + ## note: could put labels here
  geom_point()

Stream Data

ggplot(stream, aes(x = x, y = y, t = time)) +
  geom_stream()

Vector Data

Reminder: A vector is a special case of a stream

init <- c(x = 1, y = 1); times <- seq(0, 1, by = 1)

vector <- 
  data.frame(ode(init, times, f_ode, parms = NULL, method = "euler")) |> 
  pivot_wider(names_from = time, values_from = c(x, y)) |> 
  mutate(fx = x_1 - x_0, fy = y_1 - y_0, angle = atan2(fy, fx), angle_deg = angle * 180 / pi,distance = sqrt(fx^2 + fy^2))
ggplot(vector, aes(x = x_0, y = y_0)) +
  geom_segment(aes(xend = x_1, yend = y_1), 
               arrow = arrow())

ggplot(vector, aes(x = x_0, y = y_0)) + ## note:  why do i have to do this? I unloaded ggfields
  geom_vector(aes(xend = x_1, yend = y_1), 
                         normalize = FALSE, center = FALSE)

ggplot(vector, aes(x = x_0, y = y_0)) +
  geom_vector(aes(xend = x_1, yend = y_1), normalize = FALSE, center = FALSE)

ggplot(vector, aes(x = x_0, y = y_0)) + ## note:  why do i have to do this? I unloaded ggfields
  geom_vector(aes(fx = fx, fy = fy), normalize = FALSE, center = FALSE)

ggplot(vector, aes(x = x_0, y = y_0)) + ## note:  why do i have to do this? I unloaded ggfields
  geom_vector(aes(angle = angle, distance = distance), normalize = FALSE, center = FALSE)

ggplot(vector, aes(x = x_0, y = y_0)) + ## note:  why do i have to do this? I unloaded ggfields
  geom_vector(aes(angle_deg = angle_deg, distance = distance), normalize = FALSE, center = FALSE)

What if we have a lot of streams?

generate_path <- function(u, step) {
  init <- unname(c(x = u[1], y = u[2])); times <- seq(0, 1, by = step)
  df <- data.frame(ode(init, times, f_ode, parms = NULL, method = "euler"))
  colnames(df) <- c("time", "x", "y")
  df
}

streams <-
  expand.grid(x = seq(-1,1,.5), y = seq(-1,1,.5)) |> 
  apply(1, generate_path, step = .1) |> bind_rows(.id = "id") 

head(streams)
#   id time        x        y
# 1  1  0.0 -1.00000 -1.00000
# 2  1  0.1 -0.90000 -1.10000
# 3  1  0.2 -0.79000 -1.19000
# 4  1  0.3 -0.67100 -1.26900
# 5  1  0.4 -0.54410 -1.33610
# 6  1  0.5 -0.41049 -1.39051

What if we have a lot of streams?

ggplot(streams) + 
  geom_stream(aes(x = x, y = y, t = time, group = id))

What if we have a lot of vectors? Vectors are special cases of streams.

generate_path <- function(u, step) {
  init <- unname(c(x = u[1], y = u[2])); times <- seq(0, 1, by = step)
  df <- data.frame(ode(init, times, f_ode, parms = NULL, method = "euler"))
  colnames(df) <- c("time", "x", "y")
  df
}

vectors <-
  expand.grid(x = seq(-1,1,.5), y = seq(-1,1,.5)) |> 
  apply(1, generate_path, step = 1) |> bind_rows(.id = "id") 

head(vectors)
#   id time    x    y
# 1  1    0 -1.0 -1.0
# 2  1    1  0.0 -2.0
# 3  2    0 -0.5 -1.0
# 4  2    1  0.5 -1.5
# 5  3    0  0.0 -1.0
# 6  3    1  1.0 -1.0

What if we had a lot of vectors?

ggplot(vectors) + 
  geom_stream(aes(x = x, y = y, t = time, group = id))

Stream Fields

f <- function(u) {
  x <- u[1]; y <- u[2]
  c(-y, x)
}

ggplot() + geom_stream_field(fun = f)

Vector Fields

Reminder: A vector is a special case of a stream

f <- function(u) {
  x <- u[1]; y <- u[2]
  c(-y, x)
}

ggplot() + geom_stream_field(fun = f, type = "vector")

Vector Fields

Reminder: A vector is a special case of a stream

f <- function(u) {
  x <- u[1]; y <- u[2]
  c(-y, x)
}

ggplot() + geom_vector_field(fun = f)

Vector and Stream Fields

f <- function(u) {
  x <- u[1]; y <- u[2]
  c(-y, x)
}

ggplot() + 
  geom_vector_field(fun = f, n = 3, color = "green") + 
  geom_stream_field(fun = f, n = 3, color = "blue")

Stream Options: center = TRUE

Recall the ODE system: \[ \textbf{f}(x,y) = (-y,\,x) \quad \text{(i.e., } \frac{dx}{dt}=-y,\; \frac{dy}{dt}=x\text{)} \]

Euler’s method estimates the solution: \[ \mathbf{v}_{n+1} = \mathbf{v}_n + \Delta t\,\mathbf{f}(\mathbf{v}_n) \]

center = FALSE:
\(\Delta t\): from \(0\) to \(T\)

center = TRUE:
\(+ \Delta t\): from \(0\) to \(T/2\)
\(- \Delta t\): from \(0\) to \(-T/2\)

Stream Options: center = TRUE

center_backward <- ode(init, times = seq(0, -0.5, by = -0.1), f_ode, parms = NULL, method = "euler")
center_backward <- center_backward[nrow(center_backward):1, ]

center_forward <- ode(init, times = seq(0.1, 0.5, by = 0.1), f_ode, parms = NULL, method = "euler")

(center <- rbind(center_backward, center_forward))
#       time       x       y
#  [1,] -0.5 1.39051 0.41049
#  [2,] -0.4 1.33610 0.54410
#  [3,] -0.3 1.26900 0.67100
#  [4,] -0.2 1.19000 0.79000
#  [5,] -0.1 1.10000 0.90000
#  [6,]  0.0 1.00000 1.00000
#  [7,]  0.1 1.00000 1.00000
#  [8,]  0.2 0.90000 1.10000
#  [9,]  0.3 0.79000 1.19000
# [10,]  0.4 0.67100 1.26900
# [11,]  0.5 0.54410 1.33610

Stream Options: center = TRUE

ggplot() +
  geom_stream(aes(x = x, y = y, t = time), center, linewidth = 7, color = "orange") +
  geom_stream(aes(x = x, y = y, t = time), stream, linewidth = 3, color = "blue") + 
  lims(x = c(0, 1.75), y = c(0, 1.75))

Stream Options: center = TRUE

ggplot() +
  geom_stream_field(fun = f, n = 3, tail_point = TRUE, center = FALSE, linewidth = 7, color = "orange") +
  geom_stream_field(fun = f, n = 3, tail_point = TRUE, linewidth = 3, color = "blue")

Stream Options: L

p1 <- ggplot() + geom_stream_field(fun = f, n = 4) 
p2 <- ggplot() + geom_stream_field(fun = f, n = 4, L = .3) 
p3 <- ggplot() + geom_stream_field(fun = f, n = 4, L = 3) 

p1 + p2 + p3 + coord_equal() + plot_layout(guides = "collect")

Stream Options: normalize = FALSE

For normalize = TRUE:
- Default: Streams grow to length L provided by user or determined allegorically.
For normalize = FALSE:
- Default: Streams grow to length L, then all are trimmed to the fastest time T.
- Customization: Set T to control stream duration.

p1 <- ggplot() + geom_stream_field(fun = f, n = 4, normalize = FALSE) 
p2 <- ggplot() + geom_stream_field(fun = f, n = 4, normalize = FALSE, L = 0.3) 
p3 <- ggplot() + geom_stream_field(fun = f, n = 4, normalize = FALSE, T = 0.3) 
p1 + p2 + p3 + coord_equal() + plot_layout(guides = "collect")

Vector Options: center = FALSE

Without centering, an arrow is drawn from \(f(x,y)\) to \(f(x,y) + f(x,y)\)

When centered, an arrow is drawn from \(f(x,y) - \frac{1}{2} f(x,y)\) to \(f(x,y) + \frac{1}{2} f(x,y).\)

ggplot() +
  geom_vector_field(fun = f, n = 4, tail_point = TRUE, center = FALSE, linewidth = 7, color = "orange") +
  geom_vector_field(fun = f, n = 4, tail_point = TRUE, linewidth = 3, color = "blue")

Vector Options: L

When normalize = TRUE: Vectors are scaled to a fixed length \(L\) (either user-specified or computed automatically).

p1 <- ggplot() + geom_vector_field(fun = f, n = 4) 
p2 <- ggplot() + geom_vector_field(fun = f, n = 4, L = .3)
p3 <- ggplot() + geom_vector_field(fun = f, n = 4, L = 3)

p1 + p2 + p3 + coord_equal() + plot_layout(guides = "collect")

Vector Options: normalize = FALSE

When normalize = TRUE: Vectors are scaled to a fixed length \(L\) (either user-specified or computed automatically).

When normalize = FALSE: Vectors retain their natural magnitude—that is, they start at \(f(x,y)\) and extend to \(f(x,y) + f(x,y)\) (with \(\Delta T = 1\)).

p1 <- ggplot() + geom_vector_field(fun = f, n = 4) 
p2 <- ggplot() + geom_vector_field(fun = f, n = 4, normalize = FALSE)

p1 + p2 + coord_equal() + plot_layout(guides = "collect")

Gradients as Stream Fields

Consider the scalar function

\[ \phi(x,y) = \frac{1}{2}\left(x^2 + y^2\right) \]

Its gradient is a vector field is given by

\[ \nabla\phi(x,y) = \left(\frac{\partial \phi}{\partial x},\,\frac{\partial \phi}{\partial y}\right) = \left(x,\,y\right) \]

f <- function(u) {
  x <- u[1]; y <- u[2]
  (1/2) * (x^2 + y^2)
}
p1 <- ggplot() + ggfunction::geom_function_2d_1d(fun = f) ## sneak peak of chapter 3
p2 <- ggplot() + geom_gradient_field(fun = f)
p3 <- ggplot() + ggfunction::geom_function_2d_1d(fun = f) + geom_gradient_field(fun = f) 
p1 + p2 + p3 + coord_equal() + plot_layout(guides = "collect")

From Vector Field to Scalar Field

Consider the scalar function

\[ \phi(x,y) = \frac{1}{2}\left(x^2 + y^2\right) \]

Its gradient is a vector field is given by

\[ \nabla\phi(x,y) = \left(\frac{\partial \phi}{\partial x},\,\frac{\partial \phi}{\partial y}\right) = \left(x,\,y\right) \]


Express the potential function as a line integral:

\[ \phi(x,y)=\int_{(x_0,y_0)}^{(x,y)} \nabla \phi \cdot (dx,dy). \]

Integrate the x-component: \[ \phi(x,y)=\int x\,dx = \frac{1}{2}x^2 + g(y) \]

Differentiate with respect to y: \[ \frac{\partial}{\partial y}\phi(x,y)=\frac{\partial\phi}{\partial y}\left(\frac{1}{2}x^2+g(y)\right) = \frac{\partial\phi}{\partial y}g(y) = \frac{\partial \phi}{\partial y}=g'(y) = y \]

Solve for g(y) by integrating: \[ g(y) = \int g'(y)\,dy = \int y\,dy = \frac{1}{2}y^2 + C \]

So: \[ \phi(x,y) = \frac{1}{2}x^2+g(y) = \frac{1}{2}x^2+\frac{1}{2}y^2+C \]

From Vector Field to Scalar Field

Consider the scalar function

\[ \phi(x,y) = \frac{1}{2}\left(x^2 + y^2\right) \]

Its gradient is a vector field is given by

\[ \nabla\phi(x,y) = \left(\frac{\partial \phi}{\partial x},\,\frac{\partial \phi}{\partial y}\right) = \left(x,\,y\right) \]

grad_f <- function(u) {
  x <- u[1]
  y <- u[2]
  c(x, y)
}

p1 <- ggplot() + geom_potential(fun = grad_f)
p2 <- ggplot() + geom_vector_field(fun = grad_f)
p3 <- ggplot() + geom_potential(fun = grad_f) + geom_vector_field(fun = grad_f)
p1 + p2 + p3 + coord_equal() + plot_layout(guides = "collect")

Random Vector Data

ggplot(rand_dat |> print(n = 6)) + 
  geom_vector(aes(x = x, y = y, fx = fx, fy = fy), normalize = FALSE)
# # A tibble: 100 × 4
#       x     y       fx     fy
#   <dbl> <dbl>    <dbl>  <dbl>
# 1 -98.4  24.9  6.71     -3.85
# 2 -91.6  30.1 -4.42     37.7 
# 3 -91.1  30.1  0.00252  34.3 
# 4 -96.7  31.6 21.2     -20.4 
# 5 -98.5  24.8  7.02     -3.46
# 6 -99.8  33.2  0         0   
# # ℹ 94 more rows

Random Vector Data

ggplot(rand_dat) + 
  geom_vector(aes(x = x, y = y, fx = fx, fy = fy))

Random Vector Data

ggplot(rand_dat) + 
  geom_vector(aes(x = x, y = y, fx = fx, fy = fy))

Multivariate Reggsion

\(\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E},\)

where

  • \(\mathbf{Y}\) is the response matrix (with columns \(f_x\) and \(f_y\)),
  • \(\mathbf{X}\) is the design matrix (with columns \(1\), \(x\), \(y\), and \(xy\)),
  • \(\mathbf{B}\) contains the regression coefficients, and
  • \(\mathbf{E}\) is the error matrix.

The least squares estimator is

\(\widehat{\mathbf{B}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{Y}.\)

Multivariate Reggsion

ggplot(rand_dat, aes(x = x, y = y, fx = fx, fy = fy)) +
  geom_vector_smooth(eval_points = data.frame(x = -90.8, y = 26.25), method = "lm", se = FALSE) +
  geom_vector(alpha = .3, color = "black") 

Multivariate Reggsion

\[ \boldsymbol{\Sigma}_{\widehat{\mathbf{v}}} = \mathbf{X}_{\text{new}}\,\mathbf{V}\,\mathbf{X}_{\text{new}}^\top + \boldsymbol{\Sigma}, \quad \mathbf{V} = \sigma^2 (\mathbf{X}^\top \mathbf{X})^{-1}. \] Eigen-decomposition:

\[ \boldsymbol{\Sigma}_{\widehat{\mathbf{v}}} = \mathbf{Q}\,\boldsymbol{\Lambda}\,\mathbf{Q}^\top, \quad \boldsymbol{\Lambda} = \operatorname{diag}(\lambda_1,\lambda_2). \]

Semi-axis lengths and orientation:

\[ a_i = \sqrt{\lambda_i\,\chi^2_{1-\alpha}}, \quad i=1,2, \qquad \theta = \arctan\left(\frac{q_{2,1}}{q_{1,1}}\right). \]

Multivariate Reggsion

ggplot(rand_dat, aes(x = x, y = y, fx = fx, fy = fy)) +
  geom_vector_smooth(eval_points = data.frame(x = -90.8, y = 26.25), method = "lm", se = TRUE) +
  geom_vector(alpha = .3, color = "black") 

Multivariate Reggsion

ggplot(rand_dat, aes(x = x, y = y, fx = fx, fy = fy)) +
  geom_vector_smooth(method = "lm") +
  geom_vector(alpha = .3, color = "black") 

Multivariate Reggsion

ggplot(rand_dat, aes(x = x, y = y, fx = fx, fy = fy)) +
  geom_vector_smooth(method = "lm") +
  geom_vector(alpha = .3, color = "black") 

Multivariate Reggsion

\[ r = \sqrt{f_x^2 + f_y^2}, \quad \theta = \operatorname{atan2}(f_y, f_x), \] with Jacobian determinant \(|r|\). The joint density becomes: \[ f_{r,\theta}(r,\theta) = f_{f_x,f_y}(r\cos\theta,\,r\sin\theta)\,|r|. \] Marginalize over r to get angular density:

\[ f_\theta(\theta) = \int_0^\infty f_{r,\theta}(r,\theta)\,dr, \qquad F_\theta(\theta) = \int_{-\pi}^\theta f_\theta(u)\,du. \] Confidence bounds for angular uncertainty:

\[ F_\theta(\theta_{\text{lower}}) = \frac{\alpha}{2}, \quad F_\theta(\theta_{\text{upper}}) = 1 - \frac{\alpha}{2}. \]

Multivariate Reggsion

ggplot(rand_dat, mapping = aes(x = x, y = y, fx = fx, fy = fy)) +
  geom_vector(alpha = .3, color = "black", normalize = FALSE) +
  geom_vector_smooth(eval_points =  data.frame(x = -90.8, y = 26.25), method = "lm", pi_type = "wedge")

Multivariate Reggsion

ggplot(rand_dat, aes(x = x, y = y, fx = fx, fy = fy)) +
  geom_vector_smooth(n = 4, se.circle = TRUE, method = "lm") +
  geom_vector(alpha = .3, color = "black") 

Cokriging

Cokriging is a technique that extends ordinary kriging to jointly predict multiple, correlated spatial variables. It leverages the spatial cross-correlation between the variables to improve prediction accuracy.

Cokriging

ggplot(rand_dat, aes(x = x, y = y, fx = fx, fy = fy)) +
  geom_vector_smooth(eval_points = data.frame(x = -90.8, y = 26.25), method = "kriging") +
  geom_vector(alpha = .3, color = "black") 
# Linear Model of Coregionalization found. Good.
# [using ordinary cokriging]

Generalized Additive Models (GAMs):

GAMs extend linear models by allowing non-linear relationships between predictors and the response via smooth functions. In spatial statistics, a GAM typically models the response as:

\[ y = \beta_0 + f(x,y) + \varepsilon, \]

where \(f(x,y)\) is a smooth function estimated from the data.

Generalized Additive Models (GAMs):

ggplot(rand_dat, aes(x = x, y = y, fx = fx, fy = fy)) +
  geom_vector_smooth(eval_points = data.frame(x = -90.8, y = 26.25)) +
  geom_vector(alpha = .3, color = "black") 

Comparison

ggplot(rand_dat, aes(x = x, y = y, fx = fx, fy = fy)) +
  geom_vector_smooth(eval_points = data.frame(x = -90.8, y = 26.25), method = "gam") +
  geom_vector_smooth(eval_points = data.frame(x = -90.8, y = 26.25), method = "lm") +
  geom_vector_smooth(eval_points = data.frame(x = -90.8, y = 26.25), method = "kriging") +
  geom_vector(alpha = .3, color = "black") 
# Linear Model of Coregionalization found. Good.
# [using ordinary cokriging]

Multivariate Regression as an ODE

What if we saw this regression equation as a vector field? After all, it takes in x/y data and outputs fx/fy data?

vec_field <- function(u) {
  model <- lm(cbind(fx,fy)~x*y, data = rand_dat)
  newdata <- data.frame(x = u[1], y = u[2])
  as.numeric(predict(model, newdata = newdata))
}

ggplot() + 
  geom_vector(data = rand_dat, aes(x = x, y = y, fx = fx, fy = fy), alpha = .2) +
  geom_vector_field(data = rand_dat, fun = vec_field, mapping = aes(x = x, y = y, fx = fx, fy = fy))

Multivariate Regression as an ODE

ggplot(data = rand_dat) +
  geom_stream_smooth(aes(x = x, y = y, fx = fx, fy = fy))

Estimating Gradients from Scalar Data

ggplot(grid) +
  geom_raster(aes(x = x, y = y, fill = z)) +
  geom_point(data = sample_data, aes(x = x, y = y, size = z)) +
  scale_fill_gradient(low = "white", high = "black")

Estimating Gradients from Scalar Data

scalar_field <- function(u) {
  mod <- lm(z ~ x + y + poly(x,2) + poly(y,2), data = sample_data)
  pred <- predict(mod, newdata = data.frame(x = u[1], y = u[2]))
  as.numeric(pred)
}  

# plot the scalar data with the estimated gradient 
ggplot(grid) +
  geom_raster(aes(x = x, y = y, fill = z)) +
  scale_fill_gradient(low = "white", high = "black") +
  geom_gradient_field(fun = scalar_field) +
  scale_color_viridis_c()

Estimating Gradients from Scalar Data

ggplot(grid) +
  geom_raster(aes(x = x, y = y, fill = z)) +
  scale_fill_gradient(low = "white", high = "black") +
  geom_gradient_smooth(formula = ) +
  scale_color_viridis_c()

Estimating Gradients from Scalar Data

# Reading layer `Elev_Contour' from data source 
#   `C:\Users\Dusty_Turner1\OneDrive - Baylor University\r_projects\dissertation-turner\geom_vector_smooth\ELEV_Waco_W_TX_1X1_Shape\Shape\Elev_Contour.shp' 
#   using driver `ESRI Shapefile'
# Simple feature collection with 18500 features and 14 fields
# Geometry type: MULTILINESTRING
# Dimension:     XY
# Bounding box:  xmin: -98 ymin: 31 xmax: -97 ymax: 32
# Geodetic CRS:  NAD83
ggmap(basemap) +
  geom_sf(data = waco_tx, aes(colour = elevation), inherit.aes = FALSE, size = 0.3) +
  scale_color_viridis_c() +
  ggthemes::theme_map() +
  theme(legend.position = "right", legend.box = "vertical") 

Estimating Gradients from Scalar Data

map + geom_gradient_smooth(
    data = contour_with_elevation, 
    aes(x = X, y = Y, z = elevation),
    formula = z ~ x + y + poly(x,2) * poly(y,2)
  ) 

A new problem

Most ggplot2 Layers Expect Data

tibble(x = seq(-3,3,.1), y = dnorm(x)) |> 
  ggplot(aes(x = x, y = y)) +
  geom_line()

data <- tibble(x = seq(-3,3,.1), y = dnorm(x))

data |> 
  ggplot(aes(x = x, y = y)) +
  geom_line() +
  geom_area(data = data[data$x < qnorm(.95),])

Most ggplot2 Layers Expect Data

geom_point()
geom_line()
geom_segment()
geom_smooth()
geom_stream()

Some ggplot2 Layers Expect Functions

geom_function()
geom_stream_field()

Solution - ggfunction


Install from Github

remotes::install_github("dusty-turner/ggfunction")

Load the package

library("ggfunction")

\(\mathbb{R} \to \mathbb{R}\) Functions

ggplot() + geom_function(fun = dnorm, xlim=c(-3, 3))

\(\mathbb{R} \to \mathbb{R}\) Functions: Probability Distributions

Probability Density Function (PDF):

A PDF \(f_X(x)\) satisfies
\(\Pr(a \le X \le b) = \int_a^b f_X(x) \, dx, \quad \int_{-\infty}^{\infty} f_X(x) \, dx = 1\) For example, the standard normal PDF is:
\(f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)\)

Example: Using ggfunction, you can plot the normal PDF with automated shading of a quantile region:

ggplot() +
  geom_pdf(fun = dnorm, xlim = c(-3, 3), p = 0.95, fill = "grey70")